Introduction to the series: “Visualisation of My Personal Google Data”
If you give permission to Google to store your location data, they will keep them in their databases forever. You can also allow them to store it for a while and then ask them delete it. They will directly do so.
What makes this study a fun project is its being very personal. I decided to analyze my personal data in August, 2022. Therefore, I granted many new permissions to Google along with many previously granted permissions. They keep them in various formats including .csv, .json, .mbox etc. When you query for your personal data, they provide it within a couple of days depending on the size of the data you queried.
Usually, I provide the readers with the data in my posts. However, in this series, the data are very personal and so I will not.
Introduction: “My exercise routine”
In this part of the series, we will investigate my personal location data. We will visualize the spots I visited within a period of time. This way, I personally will gain insights about how boring my days are :)
The R packages that we use in this post are as follows:
Understand the Data & Pre-processing
Let’s start the procedure by reading the data into R. The head()
function provides us with the first six observations of the data frame. There are 16 variables some of which are filled with many many NA
s. such as Biking.duration..ms.
The reason for that is pretty simple; I do not ride around. The variables are mostly numeric, yet Date
gives us information about the date of the observation. The variable Running.duration..ms.
has many NA
s too. But it also has some observations because I run only a few days a week. I will use this variable to filter my data later.
Date Move.Minutes.count Calories..kcal. Distance..m. Heart.Points
1 2021-05-16 77 687.6241 4090.742 40
2 2021-05-17 79 2016.4717 3978.947 13
3 2021-05-18 31 1752.2711 2267.061 21
4 2021-05-19 140 2201.5012 9714.473 74
5 2021-05-20 50 1842.7984 2618.885 16
6 2021-05-21 63 1902.6128 4571.006 43
Heart.Minutes Average.speed..m.s. Max.speed..m.s. Min.speed..m.s. Step.count
1 40 0.9532187 1.017549 0.8888889 7654
2 13 0.3553377 1.328190 0.2391036 6161
3 21 0.4802522 1.643308 0.2550294 3450
4 74 0.5155224 1.711053 0.2550294 14000
5 16 0.3804749 1.516558 0.2588491 4078
6 43 0.5348663 1.492037 0.2656556 6768
Average.weight..kg. Max.weight..kg. Min.weight..kg. Biking.duration..ms.
1 80 80 80 NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
Walking.duration..ms. Running.duration..ms.
1 4664026 NA
2 4692410 424601
3 1610096 NA
4 8268890 93935
5 2994689 NA
6 3909532 NA
As the data contains some variables with decimal numbers, I would love to round them to increase meaningfulness. The mutate_if
function combined with is.numeric
gives me the opportunity to pick only the numeric variables to round.
So far so good. Now I want to select the variables that I want to visualize in my project and filter my data accordingly. Let’s see the variable names:
colnames(daily_metrics)
[1] "Date" "Move.Minutes.count" "Calories..kcal."
[4] "Distance..m." "Heart.Points" "Heart.Minutes"
[7] "Average.speed..m.s." "Max.speed..m.s." "Min.speed..m.s."
[10] "Step.count" "Average.weight..kg." "Max.weight..kg."
[13] "Min.weight..kg." "Biking.duration..ms." "Walking.duration..ms."
[16] "Running.duration..ms."
In this project, I will use following variables: date
, Move.Minutes.count
,Calories..kcal.
,Distance..m.
,Average.speed..m.s.
,Biking.duration..ms.
,Step.caoun
and walking.duration.ms.
. Let’s use subset
function to get rid of the unnecessary part of the data.
daily_metrics <- subset(daily_metrics, select = -c(5,6,8,9,14, 13,12,11) ) #idk why so randomly ordered numbers
head(daily_metrics)
Date Move.Minutes.count Calories..kcal. Distance..m.
1 2021-05-16 77 687.62 4090.74
2 2021-05-17 79 2016.47 3978.95
3 2021-05-18 31 1752.27 2267.06
4 2021-05-19 140 2201.50 9714.47
5 2021-05-20 50 1842.80 2618.88
6 2021-05-21 63 1902.61 4571.01
Average.speed..m.s. Step.count Walking.duration..ms. Running.duration..ms.
1 0.95 7654 4664026 NA
2 0.36 6161 4692410 424601
3 0.48 3450 1610096 NA
4 0.52 14000 8268890 93935
5 0.38 4078 2994689 NA
6 0.53 6768 3909532 NA
As mentioned before, I would like to work on the days that I run. That’s why I will also get rid of observations that do not contain any running data and then my data will be ready for the visualization.
Finally, let’s get the scatter plots of whatever we have left to see a bigger picture. There are some outliers; you will easily recognize some bubles sitting alone in their plots. However, they don’t look like they need further investigation.
Data Visualization
plotly
is an amazing alternative to ggplot2.
Today we will work on this package with our data. Our first chart contains information about the number of steps I took each day that I ran from May 2021 to November 2022.
fig<-plotly::plot_ly(data=data_for_plotting, type ="scatter", mode="lines+markers",
y=data_for_plotting$Step.count, x=~Date)
fig
When you hover over the chart you might see the date and the number of steps that I took. For instance on 12th and 28th of August, I took more than 16k steps.
Now let’s add two dimensions to our chart: the duration of walking and running in a day using the addtrace
function. Also layout
function helps us name our plot.
########################
# walking VS runing duration bars
#########################
ay<- list(
tickfont =list(color ="red"),
overlaying = "y",
side= "right",
title= "<b> secondary</b> y axis"
)
fig<-plotly::plot_ly()
fig<- fig %>%
add_trace(type ="bar",
y =data_for_plotting$Walking.duration..ms.,
x=data_for_plotting$Date,
name="walking duration (ms)"
)
fig<- fig %>% add_trace (type ="scatter", mode="lines+markers" , #yaxis="y2",
y=data_for_plotting$Running.duration..ms.,
x=data_for_plotting$Date ,
name="running duration (ms)")
fig <- fig %>% layout(
title="two axis",
yaxis2 = ay,
xaxis = list( title= "Days"),
yaxis = list( title= "Time I moved in miliseconds")
)
fig
This chart brings another aspect of my exercise routine. The days I ran in the last year, I walked a lot more than I ran, obviously. One exception to that might be 25th of July when I almost walked and ran same amount of time.
Now, besides time, let’s add another dimension of distance. This chart will have another y axis. We need to let plotly know that we will use another dimension. overlaying = "y"
argument will do it. Also, we use another addtrace
function to add the second y axis properties. yaxis="y2"
argument will link this trace to our new dimension. y2
here is defined by plotly
as default setting.
ay<- list(
tickfont =list(color ="red"),
overlaying = "y",
side= "right",
title= "<b> Distance I walked in meters </b>"
)
fig<-plotly::plot_ly()
fig<- fig %>%
add_trace(y =data_for_plotting$Move.Minutes.count,
x=data_for_plotting$Date,
name="movemnt(min.)",
type ="bar" )
fig<- fig %>% add_trace (y=data_for_plotting$Distance..m.,
x=data_for_plotting$Date ,
name="distance(m.)", yaxis="y2",
type ="scatter", mode="lines+markers")
fig <- fig %>% layout(
title="My movement VS distance <b>in 2022</b>",
yaxis2 = ay,
xaxis = list( title= "<b>Days</b>"),
yaxis = list( title= "<b> Time I moved in minutes </b> ")
)
fig
As the final result, you can see that on the first y axis there are the number of minutes (from 0 to 200) when I was in a moving state. On the other side, we have the distance I walked or run in meters (from 0 to 10k). For example; on sixth of Auust in 2022, I walked almost 7k meters in 150 minutes.
As a final step, let’s see the calories I burnt during that time. Another addtrace
would help us draw the line for the calories that I burnt throughout these days:
fig<-plotly::plot_ly()
fig<- fig %>%
add_trace(y =data_for_plotting$Step.count,
x=data_for_plotting$Date,
name="Steps ",
type ="scatter", mode="lines")
fig<- fig %>% add_trace (y=data_for_plotting$Distance..m.,
x=data_for_plotting$Date ,
name="distance",
type ="bar" )
fig<- fig %>% add_trace (y=data_for_plotting$Calories..kcal.,
x=data_for_plotting$Date ,
name="calories",
type ="scatter", mode="lines")
fig <- fig %>% layout(
title=" <b>My movement in 2022</b>",
xaxis = list( title= "<b>Days</b>"),
yaxis = list( title= "<b> steps(n)</b> / <b> distance(m)</b> / <b> calories(kcal) </b> ")
)
fig
The reason why there are not many diversity in the calories that I burnt might be because of two reasons: either the calories are not counted correctly by Google, which I believe is the case, or as the charts contain information for only the days that I run, I spent similar amounts of calories. Either way, we can see that the calories don’t change much even though took much more steps and spent more time moving on some days than some other days.
Conclusion
The plotly package have many useful embedded features. Especially the hovering property of the charts make it much more appealing easily. Another beautiful feature of plotly
is that you can use certain HTML tags on your charts. For example here in the last chart, we used <b> bolt </b>
tag in order for making our titles stand out in the layout
function.
Finally I must also mention that plotly have some features to create different types of graphs, animations and many other. Their website for R worths a visit.